Efficient Retrieval of Partial Documents

Authors

  • Justin Zobel
  • Alistair Moffat
  • Ross Wilkinson
  • Ron Sacks-Davis
Abstract

Management and retrieval of large volumes of text can be expensive in both space and time. Moreover, the range of document sizes in a large collection such as TREC presents difficulties for both the retrieval mechanism and the user. We consider division of documents into parts as a solution to the problem of the range of document sizes and show that, for databases with long documents, use of document parts can improve the quality of the information presented to the user. We also describe the compressed text database system we use to manage the TREC collection; the compressed inverted files with which it is indexed; and the techniques we use to evaluate the TREC queries, both on whole documents and on document parts.

Similar resources

An Algorithm for Extracting Similar Partial Utterances toward Flexible Spoken Document Retrieval

This paper proposes a new approach for spoken document retrieval by extracting similar partial utterances for non-segmented and non-recognized data; presentation speech, lecture speech or recorded video, and so on. For this purpose, we propose a new, efficient algorithm that performs fast matching between arbitrary sections of the database and arbitrary sections of query input. It enables searc...

Efficient Evaluation of Partial Match Queries for XML Documents Using Information Retrieval Techniques

Young-Ho Park (1), Kyu-Young Whang (1), Byung Suk Lee (2), and Wook-Shin Han (3). (1) Department of Computer Science and Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), Korea. (2) Department of Computer Science, University of Vermont, Burlington, VT, USA ...

Efficient evaluation of linear path expressions on large-scale heterogeneous XML documents using information retrieval techniques

We propose XIR-Linear, a method for efficiently evaluating linear path expressions (LPEs) on large-scale heterogeneous XML documents using information retrieval (IR) techniques. LPEs are the primary form of XPath queries, and their evaluation techniques have been researched actively. XPath queries in their general form are partial match queries, and these queries are particularly useful for sea...

Evaluating bias in retrieval systems for recall oriented documents retrieval

The evaluation of a retrieval system has always been the focus of research. Most of the retrieval systems seem to be more efficient for precision oriented documents than recall oriented documents since there is a difference between both the recall and precision oriented documents. Therefore, a system that is efficient for the retrieval of precision oriented documents does not need to be good fo...

Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica

Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...

Issues of Real Time Information Retrieval in Large, Dynamic and Heterogeneous Search Spaces

Increasing size and prevalence of real time information have become important characteristics of databases found on the internet. Due to changing information, the relevancy ranking of the search results also changes. Current methods in information retrieval, which are based on offline indexing, are not efficient in such dynamic search spaces and cannot quickly provide the most current results. ...

Journal:
  • Inf. Process. Manage.

Volume 31  Issue 

Pages  -

Publication date 1995